A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
Abstract
In this paper, we report our effort to construct the first Indonesian corpus for chat summarization. Specifically, we used multi-participant chat documents from WhatsApp, a well-known online instant messaging application. We constructed the gold standard by asking three native speakers to manually summarize 300 chat sections (152 of which contain images). As a result, three reference summaries, in both extractive and abstractive form, are produced for each chat section. The corpus is still at an early stage of investigation and opens exciting possibilities for future work.
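To make the corpus layout described in the abstract concrete, here is a minimal Python sketch of one possible record schema. The abstract does not specify a file format, so every name here (`Message`, `ReferenceSummary`, `ChatSection`, `validate`) is hypothetical. It assumes that each of the three annotators contributes one extractive summary (indices of selected messages) and one abstractive summary (free text). The sample chat lines are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical schema for illustration only; not the corpus's actual format.
@dataclass
class Message:
    sender: str
    text: str
    has_image: bool = False  # some chat sections contain images


@dataclass
class ReferenceSummary:
    annotator: str
    extractive: List[int]  # assumed: indices of the messages the annotator selected
    abstractive: str       # assumed: the annotator's own rewritten summary


@dataclass
class ChatSection:
    messages: List[Message]
    references: List[ReferenceSummary] = field(default_factory=list)

    def contains_image(self) -> bool:
        """True if any message in the section carries an image."""
        return any(m.has_image for m in self.messages)

    def extract(self, ref: ReferenceSummary) -> List[str]:
        """Return the selected messages in their original chat order."""
        return [self.messages[i].text for i in sorted(set(ref.extractive))]


def validate(section: ChatSection, n_annotators: int = 3) -> None:
    """Check for one reference per annotator and in-range message indices."""
    if len(section.references) != n_annotators:
        raise ValueError(
            f"expected {n_annotators} references, got {len(section.references)}"
        )
    for ref in section.references:
        for i in ref.extractive:
            if not 0 <= i < len(section.messages):
                raise IndexError(f"message index {i} out of range")


if __name__ == "__main__":
    section = ChatSection(
        messages=[
            Message("A", "Rapat besok jam 9 ya"),
            Message("B", "Oke, di ruang mana?"),
            Message("A", "Ruang 201", has_image=True),
        ],
        references=[
            ReferenceSummary("ann1", [0, 2], "Rapat besok jam 9 di ruang 201."),
            ReferenceSummary("ann2", [2, 0], "Rapat jam 9 besok, ruang 201."),
            ReferenceSummary("ann3", [0], "Ada rapat besok pukul 9."),
        ],
    )
    validate(section)
    print(section.contains_image())                # True
    print(section.extract(section.references[1]))  # ['Rapat besok jam 9 ya', 'Ruang 201']
```

Storing extractive summaries as message indices rather than copied text keeps them tied to the source chat, so overlap between annotators can be compared directly.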
Similar resources
Abstractive Summarization of Spoken and Written Conversations Based on Phrasal Queries
We propose a novel abstractive query-based summarization system for conversations, where queries are defined as phrases reflecting a user's information needs. We rank and extract the utterances in a conversation based on the overall content and the phrasal query information. We cluster the selected sentences based on their lexical similarity and aggregate the sentences in each cluster by means of ...
From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?
Most previous studies on meeting summarization have focused on extractive summarization. In this paper, we investigate whether we can apply sentence compression to extractive summaries to generate abstractive summaries. We use different compression algorithms, including integer linear programming with an additional step of filler phrase detection, a noisy-channel approach using Markovization formulat...
Abstractive Document Summarization with a Graph-Based Attentional Neural Model
Abstractive summarization is the ultimate goal of document summarization research, but it has previously been less investigated due to the immaturity of text generation techniques. Recently, impressive progress has been made in abstractive sentence summarization using neural models. Unfortunately, attempts at abstractive document summarization are still at a primitive stage, and the evaluation results...
Extractive vs. NLG-based Abstractive Summarization of Evaluative Text: The Effect of Corpus Controversiality
Extractive summarization is the strategy of concatenating extracts taken from a corpus into a summary, while abstractive summarization involves paraphrasing the corpus using novel sentences. We define a novel measure of corpus controversiality of opinions contained in evaluative text, and report the results of a user study comparing extractive and NLG-based abstractive summarization at differen...
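The definition of extractive summarization given above (concatenating extracts taken from the source) can be illustrated with a minimal Python sketch. The function name `extractive_summary` is hypothetical, and choosing which extracts to keep, the hard part of any real system, is assumed to happen in a separate ranking step.

```python
from typing import Sequence


def extractive_summary(sentences: Sequence[str], selected: Sequence[int]) -> str:
    """Join the selected sentences, keeping their original order.

    Deciding *which* sentences to select is out of scope; `selected`
    is assumed to come from some separate (hypothetical) ranking step.
    """
    for i in selected:
        if not 0 <= i < len(sentences):
            raise IndexError(f"sentence index {i} out of range")
    # Deduplicate and sort so the summary follows the source order.
    return " ".join(sentences[i] for i in sorted(set(selected)))


if __name__ == "__main__":
    doc = ["The battery lasts long.", "The screen is dim.", "Overall it is fine."]
    print(extractive_summary(doc, [2, 0]))  # The battery lasts long. Overall it is fine.
```

Abstractive summarization, by contrast, generates novel sentences and cannot be reduced to selecting indices like this.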
How Many Words Is a Picture Worth? Automatic Caption Generation for News Images
In this paper we tackle the problem of automatic caption generation for news images. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned. Inspired by recent work in summarization, we propose extractive and abstractive caption generation models. They both operate over the output of a probabilistic image annotation model that prep...